Hugging Face - Stable Diffusion Kickoff

blogging
jupyter
Stable Diffusion
Deep Learning
Author

Kashish Mukheja

Published

Tuesday, 19 December 2023

Let’s start by installing the diffusers library, which we’ll use throughout this notebook for image generation.

Show the code
!pip install -Uq diffusers transformers fastcore

Starting With Stable Diffusion

To do anything with Stable Diffusion, we first need to accept the model license and log in. You can find more details on model cards[1]. Once logged in, go to https://huggingface.co/settings/tokens and create a new token. If you don’t already have one, create a token named “notebooks” with “Write” permissions (in case you’re planning to log in through the notebook, just like I do below). The token is saved under your home directory at /Users/<admin>/.cache/huggingface/token (on a Mac). You can also log in from the terminal with the huggingface-cli login command, which has the same effect as notebook_login.

Segue

[1]. Model cards simply provide details about a model, from its contributors to how it was set up and trained.
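If you’d rather read a model card programmatically than on the website, huggingface_hub exposes it too. A minimal sketch using the ModelCard helper from huggingface_hub (the repo id is the same model we load later):

from huggingface_hub import ModelCard

card = ModelCard.load("CompVis/stable-diffusion-v1-4")  # fetches the model card (README) from the Hub
print(card.data.license)  # structured metadata, e.g. the license
print(card.text[:300])    # start of the free-form card text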

Show the code
import logging
from pathlib import Path

import matplotlib.pyplot as plt
import torch
from diffusers import StableDiffusionPipeline
from fastcore.all import concat
from huggingface_hub import notebook_login
from PIL import Image

logging.disable(logging.WARNING)

torch.manual_seed(1)
# Log in only if no cached Hugging Face token is found (newer versions of huggingface_hub
# store it at ~/.cache/huggingface/token; older ones used ~/.huggingface/token)
if not ((Path.home()/'.cache'/'huggingface'/'token').exists() or (Path.home()/'.huggingface'/'token').exists()):
    notebook_login()

The Stable Diffusion Pipeline

Stable Diffusion has something called a Pipeline[1]. If you’re familiar with fast.ai, this is similar to a fastai Learner. The pipeline contains all the models, preprocessing, inference logic, and so on. You can save a pipeline to the Hugging Face cloud (also called the Hub). Learn more about the diffusion inference pipeline[2].

You can start generating images with very few lines of code. You just need to provide the pre-trained model as a repo id on the Hugging Face Hub, or a path to a directory containing the pipeline weights, optionally train it further, and generate images of your choice. You can also save your own pipeline to the Hub for other people to use (see the short sketch after the segue notes below).

Segue

[1]. Many Hugging Face libraries (along with other libraries such as scikit-learn) use the concept of a “pipeline” to indicate a sequence of steps that, when combined, complete some task.

[2]. Inference means using the model to generate output (i.e., images here), as opposed to training (or fine-tuning) a model on new data.
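Once a pipeline is loaded (as in the next cell), saving it locally or pushing it to the Hub is a one-liner each. A rough sketch, assuming you have write access to a Hub repo of your own (the repo name below is hypothetical):

pipe.save_pretrained("my-sd-pipeline")              # write every component plus the config to a local folder
pipe.push_to_hub("<your-username>/my-sd-pipeline")  # upload the pipeline so others can load it with from_pretrained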

Show the code
DEVICE='mps'

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True).to(DEVICE)
  • from_pretrained creates the pipeline and downloads the pre-defined weights. This downloads a few GBs of data; the weights are cached the first time we run the cell.

  • fp16, also called half-precision floating point (i.e., float16), occupies 16 bits (two bytes on modern computers) in memory.

  • pipe.to("mps") is used if you’re working on a Mac with an M1/M2 chip.

We use from_pretrained to create the pipeline and download the pretrained weights. A little further down, we also request the fp16 (half-precision) version of the weights and tell diffusers to expect them in that format; this gives much faster inference with almost no discernible difference in quality.
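To make the half-precision point concrete, here is a tiny sketch showing that a float16 element occupies two bytes versus four for float32:

x16 = torch.ones(1000, dtype=torch.float16)
x32 = torch.ones(1000, dtype=torch.float32)
print(x16.element_size(), x32.element_size())  # 2 4 (bytes per element)
print(x16.element_size() * x16.nelement(), x32.element_size() * x32.nelement())  # 2000 4000 (bytes total)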

Show the code
!!ls ~/.cache/huggingface/hub
['models--CompVis--stable-diffusion-v1-4',
 'version.txt',
 'version_diffusers_cache.txt']
Show the code
pipe.enable_attention_slicing()  # compute attention in slices to reduce peak memory usage
Show the code
prompt = "a photograph of an astronaut riding a tiger"
Show the code
pipe(prompt).images[0]
KeyboardInterrupt: 

(The full-precision run was taking too long, so I interrupted it. Below I check for MPS support and reload the pipeline with fp16 weights for faster inference.)
Show the code
_ = pipe(prompt, num_inference_steps=1)  # quick sanity check with a single denoising step
Show the code
torch.__version__
'2.0.0'
Show the code
torch.has_mps
True
Show the code
device = "cuda" if torch.cuda.is_available() else "mps" if torch.has_mps else "cpu"
if torch.has_mps:
    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, use_auth_token=True).to(device)
else:
    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True).to(device)
/Users/kashmkj/micromamba/envs/fastai/lib/python3.11/site-packages/transformers/models/clip/feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
Show the code
image = pipe("An astronaught scuba diving").images[0]
Show the code
device = "cuda" if torch.cuda.is_available() else "mps" if torch.has_mps else "cpu"
if torch.has_mps:
    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)
Show the code
!ls ~/.cache/huggingface/hub
models--CompVis--stable-diffusion-v1-4 version_diffusers_cache.txt
version.txt
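Instead of listing the cache directory with ls, huggingface_hub can also report what is cached and how much disk it uses. A small sketch, assuming the scan_cache_dir helper available in recent huggingface_hub versions:

from huggingface_hub import scan_cache_dir

cache_info = scan_cache_dir()                # walks ~/.cache/huggingface/hub
for repo in cache_info.repos:
    print(repo.repo_id, repo.size_on_disk)   # e.g. CompVis/stable-diffusion-v1-4 and its size in bytes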
Show the code
pipe.enable_attention_slicing()
Show the code
prompt = "a photograph of an astronaut riding a horse"
Show the code
pipe(prompt).images[0]

Show the code
torch.manual_seed(1024)
<torch._C.Generator at 0x127d48810>
Show the code
pipe(prompt).images[0]

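torch.manual_seed sets the global seed; another common pattern is to pass an explicit torch.Generator to the pipeline call, which keeps the seed local to that generation. A sketch (the CPU generator is the safe choice when running on MPS; the seed value is arbitrary):

gen = torch.Generator("cpu").manual_seed(1024)  # explicit generator instead of the global seed
pipe(prompt, generator=gen).images[0]           # reproducible given the same prompt and settings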
Show the code
pipe.enable_attention_slicing()
pipe(prompt, num_inference_steps=3).images[0]

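num_inference_steps trades speed for quality: 3 steps is fast but rough, while the default 50 looks much better. A hedged sketch that runs a few step counts and shows the results side by side with matplotlib (imported earlier):

step_counts = [3, 10, 30]
images = [pipe(prompt, num_inference_steps=n).images[0] for n in step_counts]  # one full run per value, so this is slow

fig, axs = plt.subplots(1, len(images), figsize=(12, 4))
for ax, img, n in zip(axs, images, step_counts):
    ax.imshow(img)
    ax.set_title(f"{n} steps")
    ax.axis("off")
plt.show()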
Show the code
pipe
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.18.2",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "requires_safety_checker": true,
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
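The repr above lists the components the pipeline is assembled from: the CLIP tokenizer and text encoder, the UNet, the VAE, the scheduler, and the safety checker. Each is reachable as an attribute, and the scheduler in particular is easy to swap. A small sketch, assuming the EulerDiscreteScheduler that ships with diffusers:

from diffusers import EulerDiscreteScheduler

print(type(pipe.scheduler).__name__)  # PNDMScheduler (the default shown above)

# Swap in an Euler scheduler built from the same configuration
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
pipe(prompt, num_inference_steps=30).images[0]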